The Impact of Determinism on Learning Atari 2600 Games
Authors
Abstract
Pseudo-random number generation on the Atari 2600 was commonly accomplished using a Linear Feedback Shift Register (LFSR). One drawback was that the initial seed for the LFSR had to be hard-coded into the ROM. To overcome this constraint, programmers sampled from the LFSR once per frame, including on title and end screens. Since a human player introduces a random amount of delay between seeing the title screen and starting to play, the LFSR state was effectively randomized at the beginning of the game despite the hard-coded seed. Other games used the player’s actions as a source of randomness. Notable pseudo-random games include Adventure, in which a bat randomly steals and hides items around the game world, and River Raid, which used randomness to make enemy movements less predictable. Relying on the player to provide a source of randomness is not sufficient for computer-controlled agents, which are capable of memorizing and repeating pre-determined sequences of actions. Ideally, the games themselves would provide stochasticity generated from an external source such as the CPU clock; in practice, this was not an option presented by the hardware. Atari games are deterministic given a fixed policy, leading to a set sequence of actions. This article discusses different approaches for adding stochasticity to Atari games and examines how effective each approach is at derailing an agent known to memorize action sequences. Additionally, it is the authors’ hope that this article will spark discussion in the community over the following questions:
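The title-screen trick described above can be sketched in a few lines. This is a minimal illustration, not code from any actual cartridge: the 16-bit Fibonacci LFSR, its tap positions, and the seed value are all assumptions chosen for the example.

```python
def lfsr_step(state, taps=(15, 13, 12, 10), width=16):
    """Advance a Fibonacci LFSR one step.

    The tap positions here are illustrative (a common maximal-length
    16-bit configuration), not the polynomial any particular Atari
    2600 game used.
    """
    bit = 0
    for t in taps:
        bit ^= (state >> t) & 1  # XOR the tapped bits together
    return ((state << 1) | bit) & ((1 << width) - 1)

def state_after_frames(seed, frames):
    """Sample the LFSR once per frame, as the abstract describes.

    A player who lingers on the title screen for `frames` frames
    starts the game from a different LFSR state, even though the
    seed is hard-coded into the ROM.
    """
    state = seed
    for _ in range(frames):
        state = lfsr_step(state)
    return state

# Two human players with different title-screen delays diverge;
# an agent that always starts after the same delay does not.
a = state_after_frames(0xACE1, 60)  # ~1 second at 60 fps
b = state_after_frames(0xACE1, 90)  # ~1.5 seconds
```

This also makes the determinism problem concrete: a memorizing agent that presses start on the same frame every episode reproduces the same LFSR state, so the "randomness" it faces is identical on every run.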
Similar Papers
Pairwise Relative Offset Features for Atari 2600 Games
We introduce a novel feature set for reinforcement learning in visual domains (e.g. video games) designed to capture pairwise, position-invariant, spatial relationships between objects on the screen. The feature set is simple to implement and computationally practical, but nevertheless allows for substantial improvement over existing baselines in a wide variety of Atari 2600 games. In the most ...
Investigating Contingency Awareness Using Atari 2600 Games
Contingency awareness is the recognition that some aspects of a future observation are under an agent’s control while others are solely determined by the environment. This paper explores the idea of contingency awareness in reinforcement learning using the platform of Atari 2600 games. We introduce a technique for accurately identifying contingent regions and describe how to exploit this knowle...
Parameter Selection for the Deep Q-Learning Algorithm
Over the last several years deep learning algorithms have met with dramatic successes across a wide range of application areas. The recently introduced deep Q-learning algorithm represents the first convincing combination of deep learning with reinforcement learning. The algorithm is able to learn policies for Atari 2600 games that approach or exceed human performance. The work presented here i...
Games in Just Minutes
Machine learning algorithms for controlling devices will need to learn very quickly, with very few trials. Such a goal can be attained with concepts borrowed from continental philosophy and formalized using tools from the mathematical theory of categories. Illustrations of this approach are presented on a cyberphysical system: the slot car game, and also on Atari 2600 games.